ENCRYPTION DEVICE USING DATA ENCRYPTION STANDARD ALGORITHM
Field of the Invention 

The present invention relates to an encryption device and, more particularly, to an encryption device using the data encryption standard algorithm.

Prior Art of the Invention 

The DES (Data Encryption Standard) algorithm has attracted more attention as the use of networks has widened. In particular, the DES is widely used in Internet security applications, remote access servers, cable modems and satellite modems.

The DES is fundamentally a 64-bit block cipher having a 64-bit block input and a 64-bit block output. Of the 64-bit key block, 56 bits are used for encryption and decryption and the remaining 8 bits are used for parity checking. The DES receives a 64-bit plain text block and the 56-bit key as inputs and outputs a 64-bit cipher text block.
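The division of the 64-bit key block into 56 key bits and 8 parity bits may be illustrated by the following minimal sketch. It assumes the customary DES convention, which is not stated above, that the least significant bit of each key byte is an odd-parity bit; the function names are hypothetical.

```python
def has_odd_parity(key: bytes) -> bool:
    """Return True if every byte of a 64-bit key block has odd parity."""
    if len(key) != 8:
        raise ValueError("a DES key block is 8 bytes (64 bits)")
    return all(bin(byte).count("1") % 2 == 1 for byte in key)


def effective_key_bits(key: bytes) -> int:
    """Drop the assumed parity bit (LSB) of each byte, leaving 56 key bits."""
    if len(key) != 8:
        raise ValueError("a DES key block is 8 bytes (64 bits)")
    value = 0
    for byte in key:
        value = (value << 7) | (byte >> 1)  # keep the upper 7 bits of each byte
    return value
```

For instance, under this assumption a key block of eight 0x01 bytes passes the parity check while contributing 56 zero bits to the cipher.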

As its major techniques, the DES is implemented by permutation (P-Box), substitution (S-Box) and a key schedule generating sub-keys.

The data encryption itself is implemented by iterating 16 round operations, with an initial permutation (IP) at the input part and an inverse initial permutation (IP^-1) at the output part.
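The relationship between the initial permutation and its inverse may be sketched as follows. The IP table is the one published in the DES standard and is not reproduced in this description; the inverse table is derived from it rather than written out, and the helper name `permute` is hypothetical.

```python
# IP table of the DES standard: entry j gives the (1-indexed, MSB-first)
# input bit that is moved to output position j + 1.
IP = [start - 8 * j for start in (58, 60, 62, 64, 57, 59, 61, 63) for j in range(8)]

# Derive the inverse table: if IP moves input bit s to output position p,
# the inverse must move input bit p back to output position s.
IP_INV = [0] * 64
for out_pos, in_pos in enumerate(IP, start=1):
    IP_INV[in_pos - 1] = out_pos


def permute(block: int, table: list, in_width: int = 64) -> int:
    """Apply a 1-indexed, MSB-first bit permutation table to an integer."""
    out = 0
    for src in table:
        out = (out << 1) | ((block >> (in_width - src)) & 1)
    return out
```

Applying `permute` with `IP` and then with `IP_INV` returns the original 64-bit block, which is why the two units can bracket the 16 rounds without affecting the round logic.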

Fig. 1 is a detailed diagram of the cipher function and 
the S-Box permutation unit of a general DES architecture. 

Referring to Fig. 1, the cipher function f includes an expansion permutation unit 110, an exclusive-OR (XOR) unit 120, an S-Box permutation unit 130, a P-Box permutation unit 140 and an XOR unit 150.

The expansion permutation unit 110 performs expansion permutation over 32-bit data (R_(i-1)) from a right register, which registers a 32-bit text block, to output 48-bit data.

The XOR unit 120 performs an XOR operation over the 48-bit data from the expansion permutation unit 110 and a sub-key (K_i) from a key scheduler.

The S-Box permutation unit 130 performs substitution over the 48-bit data from the XOR unit 120 to output 32-bit data.

The P-Box permutation unit 140 performs permutation over the 32-bit data from the S-Box permutation unit 130.

The XOR unit 150 performs an XOR operation over the 32-bit data from the P-Box permutation unit 140 and 32-bit data (L_(i-1)) from a left register.
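The data flow through units 110 to 150 may be sketched as below. The expansion table E is the one published in the DES standard, generated here from its regular structure. The S-Box and P-Box stages are passed in as hypothetical callables because their tables are not given in this description, so this is a structural sketch rather than a complete cipher function.

```python
# Expansion table E (1-indexed, MSB-first): eight rows of six entries, each
# row overlapping its neighbours by two bits: 32 1 2 3 4 5, 4 5 6 7 8 9, ...
E = [((4 * row + col - 1) % 32) + 1 for row in range(8) for col in range(6)]


def expand(r32: int) -> int:
    """Expansion permutation unit 110: 32-bit input to 48-bit output."""
    out = 0
    for src in E:
        out = (out << 1) | ((r32 >> (32 - src)) & 1)
    return out


def round_output(l_prev: int, r_prev: int, subkey48: int, sbox_stage, pbox_stage) -> int:
    """Return L_(i-1) XOR f(R_(i-1), K_i) for caller-supplied S-Box/P-Box stages."""
    x48 = expand(r_prev) ^ subkey48        # units 110 and 120
    y32 = pbox_stage(sbox_stage(x48))      # units 130 and 140 (placeholders)
    return (l_prev ^ y32) & 0xFFFFFFFF     # unit 150
```

Because the most significant input bit appears twice in E, `expand(0x80000000)` sets two output bits, which illustrates how 32 bits grow to 48.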

The key scheduler includes two shift units 160 and 170 and a compression permutation unit 180. Each of the shift units 160 and 170 shifts its corresponding 28 bits, i.e., one half of the 56-bit key data.

The compression permutation unit 180 receives the two blocks from the shift units 160 and 170 and compresses them into the sub-key.
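The operation of the two shift units may be sketched as follows, assuming the circular left shifts and the per-round shift amounts of the DES standard, neither of which is spelled out above. The compression permutation of unit 180 is omitted because its table is not given here; the names are hypothetical.

```python
# Per-round shift amounts assumed from the DES standard (they sum to 28).
SHIFTS = [1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1]
MASK28 = (1 << 28) - 1


def rotl28(x: int, n: int) -> int:
    """Circular left shift of a 28-bit half by n positions."""
    x &= MASK28
    return ((x << n) | (x >> (28 - n))) & MASK28


def shifted_halves(c0: int, d0: int) -> list:
    """Return the 16 (C_i, D_i) pairs fed to the compression permutation."""
    c, d = c0 & MASK28, d0 & MASK28
    pairs = []
    for n in SHIFTS:
        c, d = rotl28(c, n), rotl28(d, n)   # shift units 160 and 170
        pairs.append((c, d))
    return pairs
```

Since the assumed shift amounts add up to 28, both halves return to their initial values after the sixteenth round.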

In particular, the S-Box permutation unit 130 includes 8 S-Boxes for receiving 48-bit data and outputting 32-bit data. That is, the 48-bit data block is divided into eight 6-bit data, each applied to the corresponding one of the 8 S-Boxes, and each of the 8 S-Boxes outputs 4-bit data. Accordingly, the 48-bit data is reduced to 32-bit data. The S-Box permutation unit 130 requires a memory, e.g., a programmable logic array (PLA) or a read only memory (ROM), because it employs a table look-up technique. Since each of the S-Boxes outputs 4 bits for a 6-bit input, it requires 64 x 4 bits of memory capacity, and the S-Box permutation unit 130 requires 8 x 64 x 4 bits of memory capacity. Accordingly, the S-Box permutation unit 130 takes a relatively large area in a chip.
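The table look-up and the memory arithmetic above may be illustrated with the first S-Box of the DES standard. The table values and the row/column addressing convention (outer two bits select the row, middle four bits select the column) are taken from the standard, not from this description, and the names are hypothetical.

```python
# S-Box S1 as published in the DES standard: 4 rows of 16 four-bit entries.
S1 = [
    [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7],
    [0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8],
    [4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0],
    [15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13],
]


def s1_lookup(addr6: int) -> int:
    """Map a 6-bit address to a 4-bit output using S1."""
    row = ((addr6 >> 4) & 0b10) | (addr6 & 1)   # first and last address bits
    col = (addr6 >> 1) & 0xF                    # middle four address bits
    return S1[row][col]


# The same table flattened into a 64-word x 4-bit ROM image.
S1_ROM = [s1_lookup(addr) for addr in range(64)]

BITS_PER_SBOX = 64 * 4              # 256 bits per S-Box
BITS_PER_UNIT = 8 * BITS_PER_SBOX   # 2048 bits per S-Box permutation unit
```

Every additional S-Box permutation unit therefore adds another 2048 bits of ROM or PLA, which is the area cost the pipelined architectures below have to contend with.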

Fig. 2 is a block diagram of a DES architecture having a 4-stage pipeline structure using a 2-phase clock, which affects processing capability and is applied to an embodiment of the present invention.

Referring to Fig. 2, in the DES algorithm, a 64-bit plain text block that has passed through an IP unit is divided into two blocks, a_0 and b_0. The blocks a_0 and b_0 are respectively registered at a first left register (A0) 290 and a first right register (B0) 200 by using a first clock (CLK1) and a second clock (CLK2).

The 32-bit data registered at the first right register (B0) 200 is encrypted by the cipher function f_B 210 using the sub-key (K_i) from the key scheduler, and the encrypted 32-bit data is XORed with the 32-bit data registered at the first left register (A0) 290 at the XOR unit 220. The 32-bit data from the XOR unit 220 is registered at a second left register (A1) 230 by using the first clock (CLK1).

The 32-bit data registered at the second left register (A1) 230 is encrypted by the cipher function f_C 240 using the sub-key (K_(i+1)) from the key scheduler, and the encrypted 32-bit data is XORed with the 32-bit data registered at the first right register (B0) 200 at the XOR unit 250. The 32-bit data from the XOR unit 250 is registered at a second right register (B1) 260 by using the second clock (CLK2).

The 32-bit data registered at the second right register (B1) 260 is encrypted by the cipher function f_D 270 using the sub-key (K_(i+2)) from the key scheduler, and the encrypted 32-bit data is XORed with the 32-bit data registered at the second left register (A1) 230 at the XOR unit 280. The 32-bit data from the XOR unit 280 is registered at the first left register (A0) 290 by using the first clock (CLK1).

The 32-bit data registered at the first left register (A0) 290 is encrypted by the cipher function f_A 300 using the sub-key (K_(i+3)) from the key scheduler, and the encrypted 32-bit data is XORed with the 32-bit data registered at the second right register (B1) 260 at the XOR unit 310. The 32-bit data from the XOR unit 310 is registered at the first right register (B0) 200 by using the second clock (CLK2).

At the final round, the 32-bit data of the first left register (A0) 290 becomes block b_15 and the 32-bit data from the XOR unit 310 becomes b_16.

The second clock (CLK2) is a version of the first clock (CLK1) delayed by 1/2 period. At a rising edge of the first clock (CLK1), new data are registered at the registers A0 and A1. At a rising edge of the second clock (CLK2), new data are registered at the registers B0 and B1.
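The circulation of data through the four registers may be checked with a small behavioral sketch. It models only the order in which the registers are read and written, not the clock edges, and it uses a toy round function in place of the DES cipher function; `toy_f` and the other names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def toy_f(r: int, k: int) -> int:
    """Stand-in round function; NOT the DES cipher function f."""
    return ((r * 0x9E3779B1) ^ k ^ (r >> 7)) & MASK32


def feistel_reference(a0: int, b0: int, keys: list) -> tuple:
    """Plain recurrence b_(i+1) = b_(i-1) XOR f(b_i, K_(i+1)), with b_(-1) = a_0."""
    prev, cur = a0, b0
    for k in keys:
        prev, cur = cur, prev ^ toy_f(cur, k)
    return prev, cur   # (b_15, b_16) for 16 keys


def four_register_ring(a0: int, b0: int, keys: list) -> tuple:
    """Same computation routed through registers A0, B0, A1, B1 as in Fig. 2."""
    regs = {"A0": a0, "B0": b0, "A1": 0, "B1": 0}
    # (register fed to f, register XORed with the result, destination register)
    steps = [("B0", "A0", "A1"),   # f_B
             ("A1", "B0", "B1"),   # f_C
             ("B1", "A1", "A0"),   # f_D
             ("A0", "B1", "B0")]   # f_A
    for i, k in enumerate(keys):
        src, other, dst = steps[i % 4]
        regs[dst] = regs[other] ^ toy_f(regs[src], k)
    return regs["A0"], regs["B0"]
```

With 16 sub-keys, both functions return the same pair, with b_15 ending in A0 and b_16 in B0, in agreement with the final-round description above.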

Fig. 3 is a timing diagram for explaining operation of the DES architecture having the 4-stage pipeline structure in Fig. 2.

Referring to Fig. 3, 32-bit data blocks a_0 and b_0 are generated by dividing the initial-permuted 64-bit plain text block into two 32-bit blocks, and a_0 and b_0 are respectively registered at the registers A0 and B0 at t_0 of the first clock (CLK1) and t_1 of the second clock (CLK2). Computation of b_1 (b_1 = a_0 ⊕ f(b_0, K_1)) starts at t_1 and the computed value is registered at the register A1 at t_2. Because the registers A0 and B0 are clocked by the first clock (CLK1) and the second clock (CLK2), which are delayed from each other, a_0 registered at the register A0 remains until t_2 so that a_0 can be used to compute b_1 during the t_1-t_2 period. b_1 remains until t_4 so that b_1 can be used to compute b_2 during the t_2-t_3 period. In other words, the times at which the left registers register new data are t_0, t_2, t_4, ..., and the times at which the right registers register new data are t_1, t_3, t_5, ....

Because b_0 registered in the register B0 at t_1 and b_1 registered in the register A1 at t_2 both remain during the t_2-t_3 period, b_2 (b_2 = b_0 ⊕ f(b_1, K_2)) is computed during the t_2-t_3 period and registered at the register B1 at t_3 by the second clock (CLK2). The computed values b_3, b_7, b_11, b_15 are registered in the first left register (A0) at the rising edges t_4, t_8, t_12, t_16 of the first clock (CLK1), and the computed values b_5, b_9, b_13 are registered in the second left register (A1) at the rising edges t_6, t_10, t_14 of the first clock (CLK1). Similarly, the computed values b_4, b_8, b_12, b_16 are registered in the first right register (B0) at the rising edges t_5, t_9, t_13, t_17 of the second clock (CLK2), and the computed values b_6, b_10, b_14 are registered in the second right register (B1) at the rising edges t_7, t_11, t_15 of the second clock (CLK2).

As described above, by simultaneously accessing the values stored at the registers using the 2-phase clock, the computation time for b_1, b_2, ..., b_16 can be reduced to 8.5 cycles.
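The 8.5-cycle figure follows from the register schedule described above: successive instants t_k are half a clock period apart, and b_i is latched at t_(i+1). A minimal sketch of that bookkeeping, with hypothetical function names, is:

```python
from fractions import Fraction

# Destination register of b_1, b_2, b_3, b_4, repeating every four rounds.
DESTINATIONS = ["A1", "B1", "A0", "B0"]


def register_event(i: int) -> tuple:
    """Return (register, k): the value b_i is latched into `register` at t_k."""
    if not 1 <= i <= 16:
        raise ValueError("round index must be in 1..16")
    return DESTINATIONS[(i - 1) % 4], i + 1


def elapsed_cycles(k: int) -> Fraction:
    """Time of t_k in clock periods, measured from t_0 (half a period per step)."""
    return Fraction(k, 2)
```

Since b_16 is latched at t_17, the elapsed time from t_0 is 17 half-periods, i.e., 8.5 cycles.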

Typically, for a given key, 64-bit plain text or cipher text blocks to be encrypted or decrypted are applied continuously. For example, because the encryption technique used in an MCNS cable modem performs encryption in units of MAC frames, plain text blocks of up to 1,518 bytes are encrypted by using an identical key. That is, the 16-round DES core must process a number of plain text blocks by using the identical key. In this case, the pipeline structure can increase the processing capability.

Fig. 4 is a timing diagram for explaining operation of 
pipeline of the DES architecture having the conventional 4- 
stage pipeline structure in Fig. 2. 

Referring to Fig. 4, by using the pipeline structure, two plain text blocks can be processed during 8.5 cycles. By inserting new plain text blocks c_0 and d_0 into the registers A0 and B0 at t_2 and t_3, during a vacant period in Fig. 3, the block d_i can be computed while the block b_i is being computed. In order to encrypt the blocks b_i and d_i during every period t_0-t_1, t_1-t_2, ..., two cipher functions are performed simultaneously in every period. The number of plain text blocks that can be processed within 8.5 cycles is thereby doubled. However, an additional S-Box permutation unit forming the cipher function is required.

Referring to Fig. 2 again, it shows a timing diagram for explaining operations of the cipher function when the pipeline of the DES architecture having the conventional 4-stage pipeline structure is not used and when the pipeline is used.

In the case that one 64-bit plain text block is encrypted, i.e., the pipeline is not used, the cipher functions f_A, f_B, f_C, f_D can be implemented by one S-Box permutation unit because their computations are performed in a time-divided manner by the 2-phase clock. However, when two plain text blocks are encrypted simultaneously, (f_A, f_B) and (f_C, f_D) are time-divided whereas (f_A, f_C) and (f_B, f_D) are not, so that two S-Box permutation units are required.
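Which cipher functions collide can be derived with a short scheduling sketch. It assumes, as in the description of Fig. 4, that each further block enters two half-periods after the previous one, and that round r of a block uses the functions in the order f_B, f_C, f_D, f_A; for more than four stages the naming order is an assumption by analogy. The function name is hypothetical.

```python
def active_functions(period: int, n_stages: int = 4, n_blocks: int = 2,
                     rounds: int = 16) -> set:
    """Cipher functions busy during the period t_p to t_(p+1)."""
    # Round 1 uses f_B, round 2 uses f_C, ..., the last stage wraps to f_A.
    names = ["f_" + chr(ord("A") + (k + 1) % n_stages) for k in range(n_stages)]
    active = set()
    for block in range(n_blocks):
        r = period - 2 * block          # round this block is in, if any
        if 1 <= r <= rounds:
            active.add(names[(r - 1) % n_stages])
    return active
```

With one block at most one function is busy per period, whereas with two blocks the busy pairs are always {f_B, f_D} or {f_A, f_C}, which is why a single-port S-Box permutation unit cannot serve both.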

Fig. 5 is a detailed block diagram of a conventional 
single port S-Box permutation unit. 

Referring to Fig. 5, conventionally, the pipeline operation is performed by using two S-Box permutation units, and each of the S-Box permutation units includes 8 S-Boxes, the input and output of each S-Box permutation unit being 48-bit data and 32-bit data, respectively. Each S-Box is formed by a 64 x 4 ROM or PLA and has a path receiving a 6-bit address and outputting 4-bit data. Accordingly, there are provided two physically separated paths, a first path and a second path, by the two S-Box permutation units.

Fig. 6 is a block diagram of a DES architecture having an 8-stage pipeline structure using a 2-phase clock, which affects processing capability and is applied to other embodiments of the present invention.

Referring to Fig. 6, in the DES algorithm, a 64-bit plain text block that has passed through an IP unit is divided into two blocks, a_0 and b_0. The blocks a_0 and b_0 are respectively registered at a first left register (A0) 660 and a first right register (B0) 600 by using a first clock (CLK1) and a second clock (CLK2).

The 32-bit data registered at the first right register (B0) 600 is encrypted by the cipher function f_B 610 using the sub-key (K_i) from the key scheduler, and the encrypted 32-bit data is XORed with the 32-bit data registered at the first left register (A0) 660 at the XOR unit 620. The 32-bit data from the XOR unit 620 is registered at a second left register (A1) 630 by using the first clock (CLK1).

The 32-bit data registered at the second left register (A1) 630 is encrypted by the cipher function f_C 640 using the sub-key (K_(i+1)) from the key scheduler, and the encrypted 32-bit data is XORed with the 32-bit data registered at the first right register (B0) 600 at the XOR unit 650. The rounds described above are iterated and, at the final round, the 32-bit data of the first left register (A0) 660 becomes block b_15 and the 32-bit data from the XOR unit 670 becomes b_16.

A1, A2, A3 and A0 denote the left registers, and B1, B2, B3 and B0 denote the right registers. At a rising edge of the first clock (CLK1), new data are registered at the registers A0, A1, A2 and A3. At a rising edge of the second clock (CLK2), new data are registered at the registers B0, B1, B2 and B3.

The second clock (CLK2) is an inverted clock, i.e., a version of the first clock (CLK1) delayed by 1/2 period.

Fig. 7 is a timing diagram for explaining operation of the DES architecture having the 8-stage pipeline structure in Fig. 6.

Referring to Fig. 7, 32-bit blocks a_0 and b_0 are generated by dividing the initial-permuted 64-bit plain text block into two 32-bit blocks, and a_0 and b_0 are respectively registered at the registers A0 and B0 at t_0 of the first clock (CLK1) and t_1 of the second clock (CLK2). Computation of b_1 (b_1 = a_0 ⊕ f(b_0, K_1)) starts at t_1 and the computed value is registered at the register A1 at t_2. Because the registers A0 and B0 are clocked by the first clock (CLK1) and the second clock (CLK2), which are delayed from each other, a_0 registered at the register A0 remains until t_2 so that a_0 can be used to compute b_1 during the t_1-t_2 period. b_1 remains until t_4 so that b_1 can be used to compute b_2 during the t_2-t_3 period. In other words, the times at which the left registers register new data are t_0, t_2, t_4, ..., and the times at which the right registers register new data are t_1, t_3, t_5, ....

Because b_0 registered in the register B0 at t_1 and b_1 registered in the register A1 at t_2 both remain during the t_2-t_3 period, b_2 (b_2 = b_0 ⊕ f(b_1, K_2)) is computed during the t_2-t_3 period and registered at the register B1 at t_3 by the second clock (CLK2).
The values a_0, b_7, b_15 are registered in the first left register (A0) at the rising edges t_0, t_8, t_16 of the first clock (CLK1); the computed values b_1 and b_9 are registered in the second left register (A1) at the rising edges t_2 and t_10; the computed values b_3 and b_11 are registered in the third left register (A2) at the rising edges t_4 and t_12; and the computed values b_5 and b_13 are registered in the fourth left register (A3) at the rising edges t_6 and t_14 of the first clock (CLK1).

Similarly, the values b_0, b_8, b_16 are registered in the first right register (B0) at the rising edges t_1, t_9, t_17 of the second clock (CLK2); the computed values b_2 and b_10 are registered in the second right register (B1) at the rising edges t_3 and t_11; the computed values b_4 and b_12 are registered in the third right register (B2) at the rising edges t_5 and t_13; and the computed values b_6 and b_14 are registered in the fourth right register (B3) at the rising edges t_7 and t_15 of the second clock (CLK2).

Fig. 8 is a timing diagram for explaining operation of the pipeline of the DES architecture having the 8-stage pipeline structure in Fig. 6.

Referring to Fig. 8, by using the pipeline structure, four plain text blocks can be processed during 8.5 cycles. By inserting new plain text blocks c_0 and d_0 into the registers A0 and B0 at t_2 and t_3, e_0 and f_0 at t_4 and t_5, and g_0 and h_0 at t_6 and t_7, during a vacant period in Fig. 7, the blocks d_i, f_i and h_i can be computed while the block b_i is being computed. In order to encrypt the blocks b_i, d_i, f_i and h_i during every period t_0-t_1, t_1-t_2, t_2-t_3, ..., four cipher functions are performed simultaneously in every period. The number of plain text blocks that can be processed within 8.5 cycles is thereby increased four times.

However, three S-Box permutation units should be added.

Fig. 9 shows a timing diagram for explaining operations of the cipher function when the pipeline of the DES architecture having the 8-stage pipeline structure is not used and when the pipeline is used.

In the case that one 64-bit plain text block is encrypted, i.e., the pipeline is not used, the cipher functions f_A, f_B, f_C, f_D, f_E, f_F, f_G, f_H can be implemented by one S-Box permutation unit because their computations are performed in a time-divided manner by the 2-phase clock. However, when four plain text blocks are encrypted simultaneously, (f_A, f_B, f_C, f_D) and (f_E, f_F, f_G, f_H) are time-divided whereas (f_A, f_C, f_E, f_G) and (f_B, f_D, f_F, f_H) are not, so that four S-Box permutation units are required.

Fig. 10 is a detailed block diagram of a conventional single port S-Box permutation unit.

Referring to Fig. 10, conventionally, the pipeline operation is performed by using four S-Box permutation units, and each of the S-Box permutation units includes 8 S-Boxes, the input and output of each S-Box permutation unit being 48-bit data and 32-bit data, respectively. Each S-Box is formed by a 64 x 4 ROM or PLA and has a path receiving a 6-bit address and outputting 4-bit data. Accordingly, there are provided four physically separated paths, a first path, a second path, a third path and a fourth path, by the four S-Box permutation units.

As described above, conventionally, the problem of access to the memory required for the S-Box permutation unit, i.e., the data contention problem, is solved by the two physically separated paths of the two S-Box permutation units. However, since two identical S-Box permutation units are used, the area is increased.

Summary of the Invention

Therefore, it is an object of the present invention to provide an encryption device that can access data multiple times within a given time, thereby eliminating data contention and minimizing area.

It is another object of the present invention to provide an encryption device having a reduced chip size and increased performance.

In accordance with an aspect of the present invention, there is provided an encryption device for performing encryption of plain text blocks using the data encryption standard algorithm, wherein the encryption device includes an initial permutation unit, a data encryption unit having an n-stage (n is an even number equal to or larger than four) pipeline structure using a first clock and a second clock, and an inverse initial permutation unit, the encryption device comprising: a multiplexer for selecting one of n/2 48-bit inputs; 8 S-Boxes, each for receiving a 6-bit address from the selected 48-bit input and outputting 4-bit data; a demultiplexer for distributing the 4-bit data from each of the S-Boxes to the n/2 outputs; and a controller for controlling the multiplexer and the demultiplexer with the first clock and the second clock.

In accordance with another aspect of the present invention, there is provided an encryption device for performing encryption of plain text blocks using the data encryption standard algorithm, wherein the encryption device includes an initial permutation unit, a data encryption unit having an 8-stage pipeline structure using a first clock and a second clock, and an inverse initial permutation unit, the encryption device comprising: a first multiplexer for selecting one of a first and a second 48-bit input; a first S-Box unit having 8 S-Boxes, each S-Box for receiving a 6-bit address from the 48-bit input selected by the first multiplexer and outputting 4-bit data; a first demultiplexer for distributing the 4-bit data from each of the S-Boxes to one of a first and a second output; a first controller for controlling the first multiplexer and the first demultiplexer with a third clock and a fourth clock; a second multiplexer for selecting one of a third and a fourth 48-bit input; a second S-Box unit having 8 S-Boxes, each S-Box for receiving a 6-bit address from the 48-bit input selected by the second multiplexer and outputting 4-bit data; a second demultiplexer for distributing the 4-bit data from each of the S-Boxes to one of a third and a fourth output; and a second controller for controlling the second multiplexer and the second demultiplexer with the third clock and the fourth clock, wherein the third and the fourth clocks are two times faster than the first and the second clocks.

Brief Description of the Drawings

The above and other objects and features of the instant invention will become apparent from the following description of preferred embodiments taken in conjunction with the accompanying drawings, in which:

Fig. 1 is a detailed diagram of a cipher function and an S-Box permutation unit of a general DES architecture;

Fig. 2 is a block diagram of a DES architecture having a 4-stage pipeline structure using a 2-phase clock, which affects processing capability and is applied to an embodiment of the present invention;

Fig. 3 is a timing diagram for explaining operation of the DES architecture having the 4-stage pipeline structure in Fig. 2;

Fig. 4 is a timing diagram for explaining operation of the pipeline of the DES architecture having the 4-stage pipeline structure in Fig. 2;

Fig. 5 is a detailed block diagram of a conventional single port S-Box permutation unit;

Fig. 6 is a block diagram of a DES architecture having an 8-stage pipeline structure using a 2-phase clock, which affects processing capability and is applied to other embodiments of the present invention;

Fig. 7 is a timing diagram for explaining operation of the DES architecture having the 8-stage pipeline structure in Fig. 6;

Fig. 8 is a timing diagram for explaining operation of the pipeline of the DES architecture having the 8-stage pipeline structure in Fig. 6;

Fig. 9 is a timing diagram for explaining operations of the cipher function when the pipeline of the DES architecture having the 8-stage pipeline structure is not used and when the pipeline is used;

Fig. 10 is a block diagram of a conventional single port S-Box permutation unit;

Fig. 11 is a detailed block diagram of a 2-port S-Box permutation unit in accordance with an embodiment of the present invention;

Fig. 12 is a timing diagram for explaining operation of the conventional single port S-Box permutation unit and the 2-port S-Box permutation unit of the present invention;

Fig. 13 is a detailed block diagram of a 4-port S-Box permutation unit in accordance with another embodiment of the present invention;

Fig. 14 is a timing diagram for explaining operation of the conventional single port S-Box permutation unit and the 4-port S-Box permutation unit of the present invention;

Fig. 15 is a detailed block diagram of two 2-port S-Box permutation units in accordance with still another embodiment of the present invention; and

Fig. 16 is a timing diagram for explaining operation of the conventional single port S-Box permutation unit and the two 2-port S-Box permutation units of the present invention.

Preferred Embodiment of the Invention

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.

Embodiment 1 

Fig. 11 is a detailed block diagram of a 2-port S-Box permutation unit in accordance with the present invention.

Referring to Fig. 11, an S-Box permutation unit includes a multiplexer 1110, 8 S-Boxes 1120, a demultiplexer 1130 and a controller 1140. The multiplexer 1110 selects one of two 48-bit inputs under control of the controller 1140. Each of the S-Boxes 1120 receives a 6-bit address from the selected 48-bit input and outputs 4-bit data. The demultiplexer 1130 distributes the 4-bit data from each of the S-Boxes 1120 to two outputs under control of the controller 1140. The controller 1140 controls the multiplexer 1110 and the demultiplexer 1130 with a first clock (CLK_A) and a second clock (CLK_B).

Fig. 12 is a timing diagram for explaining operation of the conventional single port S-Box permutation unit and the 2-port S-Box permutation unit.

Referring to Fig. 12, in the present invention, the signals required to access the ROM are generated by using the first clock (CLK_A) and the second clock (CLK_B), which are two times faster than the input clocks (CLK_1, CLK_2). The data contention problem is eliminated since a first path (path1) and a second path (path2) are time-divided by the multiplexer selecting one of the first path (path1) and the second path (path2) in each time period t_i-t_(i+1). That is, when the first clock (CLK_A) is logic high, the first path (path1) is selected and b_i is computed, and when the second clock (CLK_B) is logic high, the second path (path2) is selected and d_i is computed.
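The time division of the paths may be modeled behaviorally as follows. The sketch treats each logic-high phase of the fast clocks as one time slot and uses toy ROM contents instead of the DES S-Box tables; the class and its names are hypothetical and are not taken from the figures.

```python
# Toy 64 x 4 ROM contents for eight S-Boxes; NOT the DES S-Box tables.
TOY_ROMS = [[(7 * addr + 3 * s + (addr >> 2)) & 0xF for addr in range(64)]
            for s in range(8)]


def sbox_unit(x48: int) -> int:
    """One pass through the 8 S-Boxes: 48-bit input to 32-bit output."""
    out = 0
    for s in range(8):
        addr = (x48 >> (42 - 6 * s)) & 0x3F     # 6-bit address for S-Box s
        out = (out << 4) | TOY_ROMS[s][addr]
    return out


class MultiPortSBox:
    """A single S-Box permutation unit shared by `ports` time-divided paths."""

    def __init__(self, ports: int = 2):
        self.ports = ports
        self.rom_accesses = 0

    def input_clock_period(self, inputs: list) -> list:
        """Serve every path once within one period of the input clock."""
        if len(inputs) != self.ports:
            raise ValueError("one 48-bit input per port is expected")
        outputs = [0] * self.ports
        for slot in range(self.ports):       # slot 0: CLK_A high, slot 1: CLK_B high
            selected = inputs[slot]          # multiplexer
            result = sbox_unit(selected)     # the shared 8 S-Boxes
            self.rom_accesses += 1
            outputs[slot] = result           # demultiplexer
        return outputs
```

The outputs equal those of two physically separate units, while the single ROM is accessed twice per input clock period, which is the reason the access signals run at twice the input clock rate.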

As described above, by using only one S-Box permutation unit, the present invention can reduce the area of the S-Box permutation unit to a half so that circuits can be efficiently disposed, i.e., the number of net dies is increased owing to the smaller chip area, so that cost is decreased.

Embodiment 2

Fig. 13 is a detailed block diagram of a 4-port S-Box permutation unit in accordance with another embodiment of the present invention.

Referring to Fig. 13, an S-Box permutation unit includes a multiplexer 1310, 8 S-Boxes 1320, a demultiplexer 1330 and a controller 1340. The multiplexer 1310 selects one of four 48-bit inputs under control of the controller 1340. Each of the S-Boxes 1320 receives a 6-bit address from the selected 48-bit input and outputs 4-bit data. The demultiplexer 1330 distributes the 4-bit data from each of the S-Boxes 1320 to four outputs under control of the controller 1340. The controller 1340 controls the multiplexer 1310 and the demultiplexer 1330 with a first clock (CLK_A) and a second clock (CLK_B).

Fig. 14 is a timing diagram for explaining operation of the conventional single port S-Box permutation unit and the 4-port S-Box permutation unit.

Referring to Fig. 14, in the present invention, the signals required to access the ROM are generated by using the first clock (CLK_A) and the second clock (CLK_B), which are four times faster than the input clocks (CLK_1, CLK_2). The data contention problem is eliminated since a first path (path1), a second path (path2), a third path (path3) and a fourth path (path4) are time-divided by the multiplexer selecting one of the first path (path1), the second path (path2), the third path (path3) and the fourth path (path4) in each time period t_i-t_(i+1). The controller generates the signals necessary to access the ROM based on the first and the second clocks (CLK_A, CLK_B).

As described above, by using only one S-Box permutation unit, the present invention can reduce the area of the S-Box permutation unit to 1/4 so that circuits can be efficiently disposed, i.e., the number of net dies is increased owing to the smaller chip area, so that cost is decreased.

The S-box in accordance with this embodiment has smaller 
size than the conventional S-box illustrated in Fig. 10. 
However, access rate of the S-box in this embodiment is slower 
10 than that of the S-box illustrated in Fig. 10. 

Embodiment 3 

In this embodiment, for the case where the S-Box cannot be implemented by using a storage device that is four times faster, an S-Box permutation unit is implemented by two 2-port S-Box permutation units using storage devices that are two times faster than that of the S-Box illustrated in Fig. 10.

Referring to Fig. 15, each of the two S-Box permutation units includes a multiplexer 1510 or 1550, 8 S-Boxes 1520 or 1560, a demultiplexer 1530 or 1570, and a controller 1540 or 1580. The first multiplexer 1510 selects one of two 48-bit inputs under control of the controller 1540. Each of the first S-Boxes 1520 receives a 6-bit address from the selected 48-bit input and outputs 4-bit data. The first demultiplexer 1530 distributes the 4-bit data from each of the S-Boxes 1520 to two outputs under control of the controller 1540. The controller 1540 controls the multiplexer 1510 and the demultiplexer 1530 with a first clock (CLK_A) and a second clock (CLK_B). The second multiplexer 1550 selects one of two 48-bit inputs under control of the controller 1580. Each of the second S-Boxes 1560 receives a 6-bit address from the selected 48-bit input and outputs 4-bit data. The second demultiplexer 1570 distributes the 4-bit data from each of the S-Boxes 1560 to two outputs under control of the controller 1580. The controller 1580 controls the multiplexer 1550 and the demultiplexer 1570 with the first clock (CLK_A) and the second clock (CLK_B).

Fig. 16 is a timing diagram for explaining operation of the conventional single port S-Box permutation unit and the two 2-port S-Box permutation units.

Referring to Fig. 16, in the present invention, the signals required to access the ROM are generated by using the first clock (CLK_A) and the second clock (CLK_B), which are two times faster than the input clocks. The data contention problem is eliminated since, in each S-Box permutation unit, the two paths are time-divided by the multiplexer selecting one of them in each time period t_i-t_(i+1). That is, when the first clock (CLK_A) is logic high, the first path (path1) and the third path (path3) are selected and b_i and f_i are computed, and when the second clock (CLK_B) is logic high, the second path (path2) and the fourth path (path4) are selected and d_i and h_i are computed.
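The trade-off between ROM area and required access rate for the four-path case may be summarized with a short calculation. It counts only the 8 x 64 x 4 bits of ROM per S-Box permutation unit and ignores the area of the multiplexers, demultiplexers and controllers, which is a simplifying assumption; the class name is hypothetical.

```python
from dataclasses import dataclass

ROM_BITS_PER_UNIT = 8 * 64 * 4   # 2048 bits per S-Box permutation unit


@dataclass(frozen=True)
class SBoxOption:
    name: str
    units: int            # number of S-Box permutation units
    ports_per_unit: int   # time-divided paths served by each unit

    @property
    def paths(self) -> int:
        return self.units * self.ports_per_unit

    @property
    def rom_bits(self) -> int:
        return self.units * ROM_BITS_PER_UNIT

    @property
    def access_clock_multiple(self) -> int:
        """How many ROM accesses each unit performs per input clock period."""
        return self.ports_per_unit


OPTIONS = [
    SBoxOption("conventional, four single port units", 4, 1),
    SBoxOption("Embodiment 2, one 4-port unit", 1, 4),
    SBoxOption("Embodiment 3, two 2-port units", 2, 2),
]
```

All three options provide four paths; they use 8192, 2048 and 4096 bits of ROM with access clocks of one, four and two times the input clock rate, respectively, so Embodiment 3 occupies the middle ground when a four times faster storage device is not available.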

While the present invention has been shown and described 
with respect to the particular embodiments, it will be 
apparent to those skilled in the art that many changes and 
modifications may be made without departing from the spirit 
and scope of the invention as defined in the appended claims.